33 research outputs found

Predictive caching and prefetching of query results in search engines

Author: Ronny Lempel
Shlomo Moran
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2003

We study the caching of query result pages in Web search engines. Popular search engines receive millions of queries per day, and efficient policies for caching query results may enable them to lower their response time and reduce their hardware requirements. We present PDC (probability driven cache), a novel scheme tailored for caching search results, that is based on a probabilistic model of search engine users. We then use a trace of over seven million queries submitted to the search engine AltaVista to evaluate PDC, as well as traditional LRU- and SLRU-based caching schemes. The trace-driven simulations show that PDC outperforms the other policies. We also examine the prefetching of search results, and demonstrate that prefetching can increase cache hit ratios by 50% for large caches, and can double the hit ratios of small caches. When integrating prefetching into PDC, we attain hit ratios of over 0.53.
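As a rough illustration of the kind of trace-driven simulation the abstract describes, the following Python sketch replays a list of (query, page number) requests through a plain LRU cache and reports the hit ratio, optionally prefetching follow-up result pages on a miss. It is not the PDC scheme from the paper; the function name `simulate_lru`, the `prefetch` parameter, and the toy trace are assumptions made for this example only.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity, prefetch=0):
    """Replay (query, page) requests through an LRU cache; return the hit ratio.

    `prefetch` is a hypothetical knob: on a miss, that many follow-up
    result pages of the same query are also loaded into the cache.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0

    def insert(key):
        cache[key] = True
        cache.move_to_end(key)         # mark as most recently used
        while len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used entry

    for query, page in trace:
        key = (query, page)
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            insert(key)
            # Assumed prefetch policy: also fetch the next result pages.
            for extra in range(1, prefetch + 1):
                insert((query, page + extra))
    return hits / len(trace) if trace else 0.0


# Toy trace (illustrative only, not data from the paper).
toy_trace = [("a", 1), ("a", 2), ("b", 1), ("a", 1), ("b", 2), ("a", 2)]
print(simulate_lru(toy_trace, capacity=4))
print(simulate_lru(toy_trace, capacity=4, prefetch=1))
```

On this toy trace, prefetching one extra page turns the second-page requests into hits, which is the effect the abstract measures at scale on the AltaVista trace.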

CiteSeerX

Crossref

Distributed Exploration in Multi-Armed Bandits

Author: Hillel Eshcar
Karnin Zohar
Koren Tomer
Lempel Ronny
Somekh Oren
Publication date: 04/11/2013

We study exploration in Multi-Armed Bandits in a setting where $k$ players collaborate in order to identify an $\epsilon$-optimal arm. Our motivation comes from recent employment of bandit algorithms in computationally intensive, large-scale applications. Our results demonstrate a non-trivial tradeoff between the number of arm pulls required by each of the players, and the amount of communication between them. In particular, our main result shows that by allowing the $k$ players to communicate only once, they are able to learn $\sqrt{k}$ times faster than a single player. That is, distributing learning to $k$ players gives rise to a factor $\sqrt{k}$ parallel speed-up. We complement this result with a lower bound showing this is in general the best possible. On the other extreme, we present an algorithm that achieves the ideal factor $k$ speed-up in learning performance, with communication only logarithmic in $1/\epsilon$.

arXiv.org e-Print Archive

CiteSeerX

Budget-Constrained Item Cold-Start Handling in Collaborative Filtering Recommenders via Optimal Design

Author: Anava Oren
Golan Shahar
Golbandi Nadav
Karnin Zohar
Lempel Ronny
Rokhlenko Oleg
Somekh Oren
Publication date: 20/09/2016

It is well known that collaborative filtering (CF) based recommender systems provide better modeling of users and items associated with considerable rating history. The lack of historical ratings results in the user and the item cold-start problems. The latter is the main focus of this work. Most of the current literature addresses this problem by integrating content-based recommendation techniques to model the new item. However, in many cases such content is not available, and the question that arises is whether this problem can be mitigated using CF techniques only. We formalize this problem as an optimization problem: given a new item, a pool of available users, and a budget constraint, select the users to be assigned the task of rating the new item in order to minimize the prediction error of our model. We show that the objective function is monotone-supermodular, and propose efficient optimal-design-based algorithms that attain an approximation to its optimum. Our findings are verified by an empirical study using the Netflix dataset, where the proposed algorithms outperform several baselines for the problem at hand. Comment: 11 pages, 2 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Torts

Author: Andrei Z. Broder
Farzin Maghoul
Jan Pedersen
Ronny Lempel
Publication venue: UNM Digital Repository
Publication date: 01/01/1982

Crossref

PicASHOW: Pictorial authority search by hyperlinks on the web

Author: Ronny Lempel

We describe PicASHOW, a fully automated WWW image retrieval system that is based on several link-structure analyzing algorithms. Our basic premise is that a page p displays (or links to) an image when the author of p considers the image to be of value to the viewers of the page. We thus extend some well known link-based WWW page retrieval schemes to the context of image retrieval. PicASHOW’s analysis of the link structure enables it to retrieve relevant images even when those are stored in files with meaningless names. The same analysis also allows it to identify image containers and image hubs. We define these as Web pages that are rich in relevant images, or from which many images are readily accessible. PicASHOW requires no image analysis whatsoever and no creation of taxonomies for preclassification of the Web’s images. It can be implemented by standard WWW search engines with reasonable overhead, in terms of both computations and storage, and with no change to user query formats. It can thus be used to easily add image retrieving capabilities to standard search engines. Our results demonstrate that PicASHOW, while relying almost exclusively on link analysis

CiteSeerX

Competitive Caching of Query Results in Search Engines

Author: Ronny Lempel
Shlomo Moran
Publication date: 01/01/2003

We study the problem of caching query result pages in Web search engines. Popular search engines receive millions of queries per day, and for each query, return a result page to the user who submitted the query. The user may request additional result pages for the same query, submit a new query, or quit searching altogether. An efficient scheme for caching query result pages may enable search engines to lower their response time and reduce their hardware requirements. This work studies query result caching within the framework of the competitive analysis of algorithms. We define a discrete time stochastic model for the manner in which queries are submitted to search engines by multiple user sessions. We then present an adaptation of a known online paging scheme to this model. The expected number of cache misses of the resulting algorithm is no greater than 4 times the expected number of misses that any online caching algorithm will experience under our specific model of query generation

CiteSeerX

Elsevier - Publisher Connector